Reward Hierarchical Temporal Memory Model for Memorizing and Computing Reward Prediction Error by Neocortex
Authors
Abstract
In humans and animals, the reward prediction error encoded by dopamine systems is thought to be central to the temporal-difference class of reinforcement learning (RL) algorithms. Using RL algorithms, many brain models have described the function of dopamine and related areas, including the basal ganglia and frontal cortex. Despite this importance, how the reward prediction error itself is computed is not well understood, including how current states are assigned to memorized states and how the values of those states are stored. In this paper, we describe a neocortical model for memorizing state space and computing reward prediction error, called 'reward hierarchical temporal memory' (rHTM). In this model, the temporal relationships among events are stored hierarchically. Using this memory, rHTM computes reward prediction errors by associating the memorized sequences with rewards and inhibiting the predicted reward. In a simulation, our model behaved similarly to dopaminergic neurons. We suggest that our model can provide a hypothetical framework for the interaction between the cortex and dopamine neurons.
Keywords: reward; reinforcement learning; reward prediction error; temporal difference; HTM; rHTM
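The abstract does not give the underlying equations, but the temporal-difference prediction error it refers to is conventionally written as delta = r + gamma * V(s') - V(s). The sketch below illustrates that standard quantity only; the state names, discount factor gamma, and learning rate alpha are illustrative assumptions and not the authors' rHTM implementation.

```python
# Minimal sketch of a temporal-difference (TD) reward prediction error,
# the signal dopamine neurons are thought to encode. This is NOT the
# authors' rHTM model; gamma, alpha, and the states are assumptions.

gamma = 0.9   # assumed discount factor
alpha = 0.1   # assumed learning rate
values = {}   # learned value V(s) for each memorized state

def td_error(state, reward, next_state):
    """delta = r + gamma * V(s') - V(s): large for an unpredicted reward,
    near zero when the reward is fully predicted (i.e. 'inhibited')."""
    v = values.get(state, 0.0)
    v_next = values.get(next_state, 0.0)
    return reward + gamma * v_next - v

def update(state, reward, next_state):
    # Move V(s) toward the TD target by a small step alpha * delta.
    delta = td_error(state, reward, next_state)
    values[state] = values.get(state, 0.0) + alpha * delta
    return delta

# Usage: after repeated cue -> reward pairings, the error at reward delivery
# shrinks, mimicking the dampened dopamine response to a predicted reward.
for trial in range(50):
    update("cue", 0.0, "reward_state")
    update("reward_state", 1.0, "terminal")
print(round(td_error("reward_state", 1.0, "terminal"), 3))  # approaches 0
```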
Similar articles
Generating Adaptive Behaviour within a Memory-Prediction Framework
The Memory-Prediction Framework (MPF) and its Hierarchical-Temporal Memory implementation (HTM) have been widely applied to unsupervised learning problems, for both classification and prediction. To date, there has been no attempt to incorporate MPF/HTM in reinforcement learning or other adaptive systems; that is, to use knowledge embodied within the hierarchy to control a system, or to generat...
Trial-by-Trial Modulation of Associative Memory Formation by Reward Prediction Error and Reward Anticipation as Revealed by a Biologically Plausible Computational Model
Anticipation and delivery of rewards improves memory formation, but little effort has been made to disentangle their respective contributions to memory enhancement. Moreover, it has been suggested that the effects of reward on memory are mediated by dopaminergic influences on hippocampal plasticity. Yet, evidence linking memory improvements to actual reward computations reflected in the activit...
Reward prediction error signals by reticular formation neurons.
As a key part of the brain's reward system, midbrain dopamine neurons are thought to generate signals that reflect errors in the prediction of reward. However, recent evidence suggests that "upstream" brain areas may make important contributions to the generation of prediction error signals. To address this issue, we recorded neural activity in midbrain reticular formation (MRNm) while rats per...
Updating dopamine reward signals
Recent work has advanced our knowledge of phasic dopamine reward prediction error signals. The error signal is bidirectional, reflects well the higher order prediction error described by temporal difference learning models, is compatible with model-free and model-based reinforcement learning, reports the subjective rather than physical reward value during temporal discounting and reflects subje...
Early and late consolidation and reconsolidation of memory in the prelimbic cortex
Rats can learn to forage among olfactory cues and associate one with reward in only 3 massed trials. The learning is achieved in less than 10 min and results in a memory trace lasting at least 1 week. To study the neuroanatomical circuits involved in memory formation, we used immunoreactivity to the immediate early gene c-fos as a marker for neuronal activity induced by the learning. The p...
Journal title:
Volume, Issue:
Pages:
Publication date: 2012